Structural basis of DNA binding by YdaT, a functional equivalent of the CII repressor in the cryptic prophage CP-933P from Escherichia coli O157:H7

YdaT, a functional equivalent of the bacteriophage λ CII repressor in certain lambdoid phages, is a member of the POU-domain family and recognizes a 5′-TTGATTN6AATCAA-3′ inverted repeat.


Introduction
Since the early days of molecular biology, Escherichia coli bacteriophage λ has served as a model organism and as a tool in molecular genetics. The 'lysis versus lysogeny' decision of the phage has been studied in great detail and serves as a paradigm for regulatory gene circuits (Wegrzyn & Wegrzyn, 2005). The circuit, which depends on three transcription factors (CI, CII and Cro), results in bistability. The interplay between CI and Cro determines whether a prophage remains inserted in the host chromosome or starts a lytic cycle (Ptashne et al., 1980; Johnson et al., 1981). CII, on the other hand, is essential in order to enter the lysogenic path immediately after infection (Chung & Echols, 1977). CII is expressed after infection of E. coli with λ, with its expression level depending on both cellular and environmental factors that affect the half-life of this rather unstable protein (Kobiler et al., 2002). When its expression level crosses a certain threshold, it directs the lysis/lysogeny decision towards lysogeny. CII activates three promotors on the phage genome: P I, P RE and P AQ. From P AQ an antisense RNA is transcribed that prevents production of the Q protein to reduce lytic activity until the repressor CI is sufficiently expressed (Ho & Rosenberg, 1985). CI is transcribed from P RE, resulting in the shutdown of all lytic genes. Finally, from the P I promotor, the Int protein is produced that integrates the phage genome into the E. coli chromosome (Court et al., 2007). Once lysogeny has been established, CII is no longer required and remains turned off.
Lambdoid phages are a family of bacteriophages related to coliphage λ, with which they can form viable recombinants. Most lambdoid phages grow on E. coli, but a few such as P22 come from Salmonella typhimurium. They are highly polymorphic in DNA sequence and biological specificity, with differences observed in receptor specificity, integration and the mechanism of packaging (Campbell, 1994). Different E. coli strains typically contain several cryptic lambdoid phages in their genomes. For example, E. coli O157:H7 Sakai contains 18 such prophage genome elements that together constitute about 16% of its total genomic DNA content (Brüssow & Kutter, 2004).
Among the different defective prophages (that are missing one or more crucial genes to allow initiation of a lytic cycle) present in the genome of E. coli O157:H7 is CP-933P (Perna et al., 2001). It contains a three-component version of the parDE type of toxin-antitoxin module termed paaR2-paaA2-parE2 next to the ydaST gene pair (Hallez et al., 2010). The ydaS and ydaT genes were initially suspected (from an analysis using the RASTA-Bacteria algorithm) to constitute a toxin-antitoxin operon in E. coli O157:H7, with YdaT supposedly being the toxin (Sevin & Barloy-Hubler, 2007). Experimental evidence nevertheless failed to prove this hypothesis (Christensen-Dalsgaard et al., 2010). Indeed, their genetic context pinpoints ydaS and ydaT as equivalents of cro and cII, respectively (Casjens, 2003; Jobling, 2018; Jurėnas et al., 2021). Homologs are found in a number of rac prophages, where their expression is repressed by RacR (Krishnamurthi et al., 2017). Rac phages are a group of defective prophages that are found in various E. coli strains, with Rac itself being the first defective prophage to be discovered in E. coli K-12 (Kaiser & Murray, 1979). The CI repressor is here typically referred to as 'RacR'.
A schematic comparison between the immunity regions of bacteriophage λ and CP-933P is given in Fig. 1. Similar to CII, which binds the P RE promotor located between the cro and cII genes and activates transcription from P RE, YdaT is predicted to bind at the interface between the ydaS and ydaT genes. This region contains the P RE933P promotor, the CP-933P equivalent of P RE. Although basal transcriptional activity from P RE933P is very weak, overexpression of YdaT increases this activity significantly (Jurėnas et al., 2021).
Although both YdaS and YdaT are predicted to contain a helix-turn-helix motif and serve a similar function as the Cro and CII proteins, respectively (Jurėnas et al., 2021), the YdaS and YdaT proteins show no detectable sequence similarity to Cro or CII. YdaT proteins constitute a family of transcription factors that currently remain uncharacterized in terms of structure and DNA-binding activity. In order to better understand how YdaT functions at the molecular level, we determined the crystal structure of CP-933P YdaT and identified its exact binding site as three regions, O L, O M and O R, between the ydaS and ydaT genes. Of these, O M covers the alternative transcription start for the paaR2-paaA2-parE2 operon as identified by Jurėnas et al. (2021). We furthermore created and validated a model for the interaction between YdaT and its operator. Together, our results paint a consistent picture of the functioning of YdaT repressors.

Cloning, expression and purification
Plasmid pET-28b containing the open reading frame for YdaT from E. coli O157:H7 (UniProt ID A0A6M7H0F8) with an N-terminal His tag (GSSHHHHHHSSG) was transformed into competent E. coli BL21 (DE3) cells (Table 1). Transformed cells were plated on agar plates supplemented with kanamycin (25 µg ml−1) and incubated at 37 °C overnight. LB medium supplemented with kanamycin (25 µg ml−1) was inoculated with one colony and left to incubate overnight at 37 °C while shaking at 130 rev min−1. 5 ml of the overnight culture was added to 500 ml LB medium (supplemented with 25 µg ml−1 kanamycin) in 2 l flasks and incubated at 37 °C with shaking at 130 rev min−1. When the OD600 reached 0.6-0.8, protein expression was induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). Upon induction, the cultures were incubated at 37 °C for 4 h, centrifuged at 5000 rev min−1 for 15 min, resuspended in lysis buffer [20 mM Tris-HCl, 500 mM NaCl, 20 mM MgCl2 pH 8.0, 0.1 mg ml−1 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride (AEBSF)] and stored at −80 °C. To purify the protein, the frozen cells were thawed and DNase I was added (50 µg ml−1). The cells were lysed by sonication (5 min, 5 s on and 5 s off, 70% amplitude) and the lysate was centrifuged at 18 000 rev min−1 for 45 min. The supernatant was filtered (0.45 µm HAWP filter) and loaded onto a pre-packed HisTrap HP Ni2+-Sepharose column (GE Healthcare) pre-equilibrated in 20 mM Tris-HCl, 500 mM NaCl, 5 mM imidazole pH 8.0. The column was then washed with the same buffer until baseline stabilization. A linear-gradient elution (0.0-1.0 M imidazole in 20 column volumes) was applied using 20 mM Tris-HCl, 500 mM NaCl, 1 M imidazole pH 8.0. Fractions containing the protein were concentrated and loaded onto a Superdex 200 16/90 size-exclusion chromatography (SEC) column (GE Healthcare) pre-equilibrated in 20 mM Tris-HCl, 500 mM NaCl pH 8.0. The purity of the protein was assessed by SDS-PAGE.
Figure 1. Comparison of the immunity regions of bacteriophage λ and E. coli O157:H7 prophage CP-933P. Genes are shown as arrows, with the direction of the arrow indicating the direction of transcription. Promoters of λ and their CP-933P equivalents are indicated. The three repressors are coloured grey, while other neighbouring phage genes are white. The toxin-antitoxin genes parE2 and paaA2 in CP-933P are coloured black and replace rexB and rexA, respectively, in λ. The border sequence of the interface between the ydaS and ydaT genes is highlighted and the three YdaT binding sites O L, O M and O R are boxed in grey and labelled. The O M sequence also contains an alternative transcription start for the paaR2-paaA2-parE2 operon that is controlled via YdaT (Jurėnas et al., 2021).
A truncated variant of YdaT lacking the 45 C-terminal residues (YdaT 1-96) was created by replacing Ser97 with a stop codon (TAA) via PCR amplification of the whole plasmid using primers 1 and 2 (Table 1, Supplementary Table S1) and Q5 High-Fidelity 2× Master Mix (NEB). The unmodified plasmid was degraded by incubation with DpnI for 1 h at 37 °C. The mutations were confirmed by sequencing and CaCl2-competent E. coli BL21 Star (DE3) cells were transformed with the mutated plasmids. Both proteins were expressed and purified as described for wild-type YdaT except for the use of a Superdex 75 16/60 SEC column (GE Healthcare) for the SEC purification step.

Concentration determination
The concentrations of the samples were determined by measuring the UV absorbance at 280 nm for the proteins and at 260 nm for the oligonucleotides. Extinction coefficients for the proteins were calculated using ProtParam (Gasteiger et al., 2005; https://web.expasy.org/protparam/) and are 19 940 M−1 cm−1 for YdaT and 15 470 M−1 cm−1 for YdaT 1-96.
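The conversion from absorbance to concentration follows the Beer-Lambert law, c = A/(εl). A minimal Python sketch is given below; the extinction coefficients are those quoted above, whereas the absorbance reading, the 1 cm path length and the function name are illustrative assumptions rather than values from this work.

```python
def molar_concentration(absorbance, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_cm)


# Extinction coefficients quoted in the text (M^-1 cm^-1).
EPS_YDAT = 19940.0       # full-length YdaT
EPS_YDAT_1_96 = 15470.0  # truncated YdaT 1-96

a280 = 0.50  # hypothetical absorbance reading, 1 cm cuvette assumed
conc = molar_concentration(a280, EPS_YDAT)
print(f"{conc * 1e6:.1f} uM YdaT monomer")
```

Because the coefficients are per polypeptide chain, the result is a monomer concentration; dividing by four gives tetramer equivalents.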
For DNA, extinction coefficients were calculated according to Tataurov et al. (2008).

X-ray crystallography
Crystals of YdaT were produced by a sitting-drop method using three-well Intelli-Plates and a Mosquito robot. 100 nl YdaT protein (13.23 mg ml−1 in 20 mM Tris, 200 mM NaCl pH 7.5) was mixed with 100 nl reservoir solution and equilibrated against 70 µl reservoir solution from various commercial crystallization kits at 19 °C. Crystals grew in 0.1 M bis-Tris pH 5.5, 2.0 M ammonium sulfate. The crystals were harvested, dipped in cryoprotectant solution [0.1 M bis-Tris pH 5.5, 2.0 M ammonium sulfate, 22.5%(v/v) glycerol] and vitrified in liquid nitrogen.
Diffraction data were collected on beamline PROXIMA-1 at the SOLEIL synchrotron, Gif-sur-Yvette, France and were recorded on an EIGER X 9M photon-counting area detector. The data were indexed, scaled and merged using XDS (Kabsch, 2010). The data were further corrected for anisotropy with the STARANISO server (Tickle et al., 2018) using the 'unmerged data' protocol. The data were limited to 240° of total crystal rotation due to radiation damage.
The structure of YdaT was determined by molecular replacement using Phaser as implemented in the CCP4 package (McCoy et al., 2007). The crystal structure of PDB entry 3c4r (sequence identity of 84%; annotated as an uncharacterized protein from a cryptic prophage in E. coli O6:H1; New York SGX Research Center for Structural Genomics, unpublished work) was used as the search model. The initial solution was refined with phenix.refine using a maximum-likelihood target against intensities (Afonine et al., 2012) and was rebuilt with Coot (Emsley et al., 2010). NCS restraints were applied throughout the refinement procedure and the final cycles involved refinement of TLS parameters. Data-collection and refinement statistics are summarized in Table 2 and full details are given in Supplementary Tables S2 and S3.

Circular-dichroism spectroscopy
Measurements were performed on a Jasco J-750 spectrophotometer (Jasco, Japan). Spectra were collected from 200 to 250 nm every 1 nm, with a bandwidth of 1 nm, a scan rate of …
Table 1. Macromolecule-production information.
… where θ is the ellipticity (raw data), c is the molar concentration and l is the optical path length. The buffer spectrum was subtracted from the sample spectra. Protein solutions were prepared in 20 mM Tris-HCl, 500 mM NaCl pH 8.0 buffer.
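As an illustration of how raw ellipticity is commonly normalized by concentration and path length, the Python sketch below subtracts a buffer spectrum and converts the result to molar ellipticity. The convention used, [θ] = θ/(10cl) with θ in mdeg, c in mol l−1 and l in cm (giving deg cm2 dmol−1), is an assumption about the normalization and is not taken from this work; all function names and numerical values are hypothetical.

```python
def subtract_buffer(sample_mdeg, buffer_mdeg):
    """Point-by-point buffer subtraction for spectra on the same wavelength grid."""
    if len(sample_mdeg) != len(buffer_mdeg):
        raise ValueError("spectra must have the same number of points")
    return [s - b for s, b in zip(sample_mdeg, buffer_mdeg)]


def molar_ellipticity(theta_mdeg, conc_molar, path_cm):
    """Assumed convention: [theta] = theta / (10 * c * l), in deg cm^2 dmol^-1."""
    return theta_mdeg / (10.0 * conc_molar * path_cm)


# Hypothetical three-point spectrum (mdeg) and matching buffer blank.
sample = [-12.0, -15.5, -14.0]
blank = [-0.5, -0.4, -0.6]
corrected = subtract_buffer(sample, blank)
normalized = [molar_ellipticity(t, conc_molar=1.0e-5, path_cm=0.1) for t in corrected]
print(normalized)
```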

Size-exclusion chromatography and multi-angle light scattering (SEC-MALS)
YdaT was dialysed overnight against 20 mM Tris-HCl, 200 mM NaCl pH 7.5. The buffer was filtered through a 0.1 µm filter (Sartorius) three times before measurements. Protein samples were centrifuged at 17 000g for 10 min prior to injection. 20 µl of protein sample at a concentration of 1 or 10 mg ml−1 was injected into a Shodex KW402.5-4F column (Showa Denko K. K.) that had been pre-equilibrated with dialysis buffer on an Alliance e2695 XE HPLC System (Waters) connected to a TREOS II light-scattering detector (Wyatt Technology) and a Shodex RI-501 refractive-index detector (Showa Denko K. K.). The flow rate was set to 0.2 ml min−1. Data processing and molecular-weight calculations were performed using the ASTRA V software (Wyatt Technology). Bovine serum albumin at a concentration of 1 mg ml−1 was used as a standard for calibration.

Native mass spectrometry
YdaT was buffer-exchanged into 200 mM ammonium acetate pH 8 using Amicon Ultra 0.5 ml centrifugal filters (Merck Millipore) with a molecular-weight cutoff of 3 kDa. The concentrations of protein and oligonucleotide alone were 2.5 µM (tetramer concentration) and 5 µM (DNA duplex), respectively. The complexes between YdaT and oligonucleotide were prepared at different YdaT tetramer:DNA molar ratios (0.25:1, 0.25:1.5, 0.5:1, 0.75:1, 1:1, 1.25:1 and 1.5:1), keeping the protein concentration fixed at 2.5 µM YdaT tetramer. Native mass spectrometry was performed on a Synapt G2 mass spectrometer (Waters). The samples were introduced into the gas phase through nano-electrospray ionization with in-house-prepared gold-coated borosilicate glass capillaries. The settings were optimized for the analysis of larger structures as natively as possible. The critical voltages and pressures used were a sampling cone voltage of 50 V and a trap collision energy of 10 V, with pressures throughout the instrument of 6.18 and 2.42 × 10−2 mbar for the source and trap collision cell regions, respectively. Analysis of the acquired spectra was performed using MassLynx version 4.1 (Waters). Native MS spectra were smoothed (to an extent depending on the size of the complexes) and additionally centred to calculate the molecular weights to determine precise stoichiometries.

Isothermal titration calorimetry
Oligonucleotides were purchased from Sigma and were HPLC-purified. Double-stranded DNA fragments were prepared by mixing single-stranded oligonucleotides corresponding to the upper and lower strands of the operator DNA in a 1:1 molar ratio followed by incubation at 95 °C for 5 min in a water bath. They were subsequently left to cool to room temperature. The annealing was checked by native PAGE. Proteins and oligonucleotides were dialysed against 10 mM NaH2PO4, 10 mM Na2HPO4, 100 mM NaCl, 50 mM glutamic acid, 50 mM arginine pH 7.5 with two buffer changes followed by overnight dialysis. Prior to measurements, the samples were spun down at 13 300 rev min−1 for 10 min and degassed on a degassing station (TA Instruments) for 15 min.
The titrations were carried out at 25 °C. The concentrations of the oligonucleotides in the sample cell were 5 µM. The concentration of YdaT in the syringe ranged between 25 and 140 µM calculated for YdaT as a tetramer. When O M was titrated with YdaT, the concentration in the cell was 5 µM and that in the syringe was 100 µM. The measurements were performed on a VP-ITC microcalorimeter (MicroCal) or a MicroCal PEAQ-ITC microcalorimeter (Malvern Panalytical). The integration of thermograms was performed with NITPIC (Scheuermann & Brautigam, 2015).
For the binding of YdaT to O M, we assumed that YdaT contains two independent non-equivalent binding sites for O M. The corresponding binding reaction can be represented as
[reaction scheme]
For the titrations of YdaT against O LM and O MR an additional binding step was included in the mechanism, since YdaT can also bind to the second site on the DNA,
[reaction scheme]
The values of the equilibrium constants K 1, K 2 and K 3 and the corresponding enthalpies of reaction were obtained by fitting the appropriate model equation to the ITC data in a similar way to that described previously (Vandervelde et al., 2017). A system of mass-balance equations was derived from the above reaction schemes. Given the total concentrations of reactants (YdaT and oligonucleotides) and the assumed values of the constants K i, we calculated the equilibrium concentrations of all molecular species using a root-solving routine. By calculating the composition of the experimental system at each point during the titration, we then calculated the model-based value of the enthalpy as
$$\Delta H = \sum_i \Delta H_i \left(\frac{\partial n_i}{\partial n_{\mathrm{tit}}}\right)_{p,T},$$
where ΔH i is the enthalpy of formation of the complex i and (∂n i/∂n tit)p,T is the corresponding partial derivative at the given pressure p and temperature T, in which n i is the amount of complex i and n tit is the amount of added titrant (YdaT or oligonucleotide, depending on the titration). The parameters K i and ΔH i were adjusted using the Nelder-Mead optimization algorithm to produce the best match between the model-calculated and experimental values of ΔH. The interactions of YdaT with O M in both direct (protein to DNA) and reverse titrations were fitted simultaneously (global fit), while for interactions of YdaT with O LM or O MR only direct titrations were used for fitting.
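The mass-balance step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the fitting code used in this work: it assumes a sequential two-step scheme (P + D ⇌ PD with association constant k1, then PD + D ⇌ PD2 with k2), treats dh2 as the stepwise enthalpy of the second binding event, ignores dilution during the titration and uses a simple bisection as the root-solving routine. All function names and numerical values are hypothetical.

```python
def solve_two_step(p_tot, d_tot, k1, k2, n_iter=200):
    """Equilibrium concentrations (M) for the assumed sequential scheme
    P + D <=> PD (k1, M^-1) and PD + D <=> PD2 (k2, M^-1)."""
    def dna_total(d_free):
        # Free DNA plus DNA bound in PD and PD2, via the binding polynomial.
        q = 1.0 + k1 * d_free + k1 * k2 * d_free ** 2
        return d_free + p_tot * (k1 * d_free + 2.0 * k1 * k2 * d_free ** 2) / q

    lo, hi = 0.0, d_tot  # dna_total rises monotonically, so bisection is safe
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if dna_total(mid) > d_tot:
            hi = mid
        else:
            lo = mid
    d_free = 0.5 * (lo + hi)
    p_free = p_tot / (1.0 + k1 * d_free + k1 * k2 * d_free ** 2)
    pd = k1 * p_free * d_free
    pd2 = k2 * pd * d_free
    return p_free, pd, pd2, d_free


def heat_content(p_tot, d_tot, k1, k2, dh1, dh2):
    """Sum over complexes of enthalpy times concentration (PD2 carries dh1 + dh2)."""
    _, pd, pd2, _ = solve_two_step(p_tot, d_tot, k1, k2)
    return dh1 * pd + (dh1 + dh2) * pd2


def model_enthalpy(p_tot, d_tot, k1, k2, dh1, dh2, step=1e-9):
    """Finite-difference estimate of the enthalpy per mole of added DNA."""
    h0 = heat_content(p_tot, d_tot, k1, k2, dh1, dh2)
    h1 = heat_content(p_tot, d_tot + step, k1, k2, dh1, dh2)
    return (h1 - h0) / step
```

In a full fit, `model_enthalpy` would be evaluated at every injection and the constants adjusted by an optimizer (for example a Nelder-Mead routine) to minimize the deviation from the integrated heats.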

Electrophoretic mobility shift assay (EMSA)
For radioactive EMSA, a 206 bp segment of DNA was used that corresponds to the end of the ydaS gene and the start of the ydaT gene. The DNA was generated by PCR using primers 5 and 6 (Supplementary Table S1) and genomic DNA of E. coli O157:H7 strain EDL933 as a template. One of the primers was labelled with γ-32P-ATP using T4 polynucleotide kinase. The PCR fragment was purified from polyacrylamide gel and concentrated by ethanol precipitation. In the same way, a nonspecific DNA was generated from genomic DNA of Cupriavidus metallidurans NA4 containing the promotor region of prsQ2. Protein and DNA were mixed in 20 mM Tris, 200 mM NaCl pH 7.5 in a total volume of 20 µl at different protein concentrations and left to incubate for 30 min at room temperature. The protein concentrations (tetramer equivalents) were in the range 2-30 µM. DNA with 7500 cpm was used in each reaction. The samples were then loaded onto a 6% nondenaturing polyacrylamide gel using 2 µl loading dye (25% Ficoll, 0.1% xylene cyanol, 0.1% bromophenol blue).

DNase I footprinting
For DNase I footprinting, a 236 bp segment of DNA corresponding to the end of the ydaS gene and the start of the ydaT gene was generated by PCR using primers 5 and 7 (Supplementary Table S1) as described above with genomic DNA of E. coli O157:H7 strain EDL933 as a template. One of the primers was labelled with γ-32P-ATP using T4 polynucleotide kinase. The PCR fragment was purified from polyacrylamide gel and concentrated by ethanol precipitation. The protein solution was mixed with labelled DNA at different concentrations of the protein for 30 min in 20 mM Tris, 200 mM NaCl pH 7.5 in a total volume of 20 µl before adding 0.2 µl DNase I. The reaction was stopped by the addition of 12.5 µl 3 M ammonium acetate, 0.25 M EDTA. The DNA was ethanol precipitated (3 µl of 3 mg ml−1 yeast tRNA was added to the precipitation solution for higher DNA recovery) and analysed on a 6% denaturing polyacrylamide gel using formamide dye (0.03% xylene cyanol, 0.03% bromophenol blue, 20 mM EDTA dissolved in formamide) as a loading buffer. Reference sequencing ladders were prepared from the same 32P-ATP-labelled DNA using citrate or hydrazine following the Maxam-Gilbert sequencing method (Maxam & Gilbert, 1980).

Small-angle X-ray scattering
Small-angle X-ray scattering (SAXS) data for YdaT and its mutants were collected in HPLC mode on the SWING beamline at the SOLEIL synchrotron, Gif-sur-Yvette, France. The data for YdaT in complex with the O M operator fragment were collected on beamline BM29 at the European Synchrotron Radiation Facility (ESRF), Grenoble, France, also in HPLC mode. Details of data collection and analysis are given in Supplementary Table S4. Samples were dialysed against 20 mM Tris-HCl, 200 mM NaCl pH 8 with three buffer changes, with the last buffer change left to dialyse overnight. The dialysis buffer was filtered through a 0.20 µm HAWP filter. For free YdaT and its mutants, a Shodex KW402.5-4F column (Showa Denko K. K.) was equilibrated with the dialysis buffer and 45 µl sample at 10 mg ml−1 was injected into the column. The sample was run at 0.3 ml min−1. Prior to measurements, the sample was centrifuged at 13 300 rev min−1 for 10 min. The YdaT-DNA complex for SAXS measurements was prepared by mixing YdaT with an excess of the oligonucleotide O M. The complex was left to incubate for 30 min and then injected onto an ENrich SEC 650 column (Bio-Rad). The peak corresponding to the complex was collected and concentrated to a final volume of 50 µl. The complex was then injected onto a pre-equilibrated AdvanceBio SEC 300 column (Agilent) and run at 0.16 ml min−1 at the beamline. Radiation damage was monitored by evaluating R g values per frame during data collection. The data were normalized to the intensity of the transmitted beam and radially averaged. The contributions of the buffer to the scattering were measured at the beginning of the elution and were subtracted from the scattering of the protein. The data from the SWING beamline were processed with Foxtrot …
All simulations were performed in XPLOR-NIH version 2.49 (Schwieters et al., 2003).
Refinement of the YdaT tetramer against the experimental SAXS data was carried out starting from the crystal structure presented in this work. Protons and atoms of the residues that were not resolved in the X-ray structure were added in XPLOR-NIH, followed by minimization of the energy function consisting of the standard geometric (bonds, angles, dihedrals and impropers) and steric (van der Waals) terms. The position of the C-terminal four-helix bundle was kept fixed and the N-terminal POU domains were treated as rigid-body groups, while the N- and C-terminal tails and the flexible hinges (residues Ser96-Tyr99) were given full torsional degrees of freedom. YdaT 1-96 was refined as a monomer. The POU domain was kept fixed, while the N-terminal purification tag of YdaT 1-96 was allowed to move. The computational protocol comprised an initial simulated-annealing step followed by side-chain energy minimization as described previously (Schwieters et al., 2003; Schwieters & Clore, 2014). In addition to the standard geometric and steric terms, the energy function included a knowledge-based dihedral angle potential and a SAXS energy term incorporating the experimental data (Schwieters & Clore, 2014; Schwieters et al., 2018).
For refinement of the YdaT-DNA complex, the atomic coordinates were taken from a model built from the crystal structure of YdaT solved in this work (PDB entry 8bt1) and that of mouse Oct-4 bound to its operator DNA (PDB entry 6ht5; J. Vahokoski, V. Pogenberg & M. Wilmanns, unpublished work). The position of the C-terminal four-helix bundle was kept fixed and pairs of YdaT DNA-binding domains and their respective DNA double-stranded helix were grouped as rigid-body units, while the N- and C-terminal tails and the flexible hinge (residues Ser96-Tyr99) were given full torsional degrees of freedom. Multiple copies of the molecular system (N = 1-5) were refined simultaneously in order to simulate molecular ensembles of multiple conformers (Schwieters & Clore, 2014).
In each refinement run, 100 structures were calculated and the ten lowest-energy solutions, representing the best agreement with the experimental data, were retained for subsequent analysis. The agreement between the experimental and calculated SAXS curves (obtained with the calcSAXS helper program, which is part of the XPLOR-NIH package) was assessed by calculating χ2,
$$\chi^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{I(q)_{\mathrm{calc},i} - I(q)_{\mathrm{exp},i}}{\delta I(q)_{\mathrm{exp},i}}\right)^2,$$
where I(q) calc,i and I(q) exp,i are the scattering intensities at a given q for the calculated and experimental SAXS curves, respectively, δI(q) exp,i is the experimental error on the corresponding I(q) exp,i value and n is the number of data points defining the experimental SAXS curve.
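A goodness-of-fit measure of this kind can be sketched in a few lines of Python. The snippet assumes the reduced form χ2 = [1/(n − 1)] Σ [(I calc,i − I exp,i)/δI exp,i]2; the 1/(n − 1) normalization, the function name and the example values are assumptions made for illustration, and the calcSAXS program itself is not called here.

```python
def saxs_chi2(i_calc, i_exp, sigma_exp):
    """Reduced chi-square between calculated and experimental intensities.

    Assumes normalization by (n - 1), where n is the number of data points.
    """
    n = len(i_exp)
    if len(i_calc) != n or len(sigma_exp) != n:
        raise ValueError("all curves must have the same number of points")
    if n < 2:
        raise ValueError("at least two data points are required")
    total = sum(((c - e) / s) ** 2 for c, e, s in zip(i_calc, i_exp, sigma_exp))
    return total / (n - 1)


# Hypothetical intensities on a shared q grid.
calc = [10.2, 8.1, 5.9, 4.0]
exp = [10.0, 8.0, 6.0, 4.1]
err = [0.2, 0.2, 0.1, 0.1]
print(round(saxs_chi2(calc, exp, err), 3))
```

Values close to 1 indicate that the model reproduces the data to within the experimental error.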

YdaT belongs to the POU family
YdaT was crystallized and its structure was determined at 2.4 Å resolution to R work = 0.195 and R free = 0.269 (Table 2 and Supplementary Tables S2 and S3). The crystal contained a tetramer in the asymmetric unit. The X-ray data are highly anisotropic, and despite anisotropy correction using STARANISO the R factors remained relatively high. While stereochemical parameters and density fit are good for chains A and B, they are significantly poorer for chain C and especially for chain D, despite careful model building combined with experimenting with different refinement strategies. The final structure was built for residues Lys2-Leu124. The N-terminal His tag as well as the C-terminal residues Tyr125-His141 are disordered and do not provide interpretable electron density.
The YdaT monomer consists of an N-terminal globular helix-turn-helix (HTH)-containing domain followed by a long 29-residue α-helix (Fig. 2a and Supplementary Fig. S1a). The globular domain is all-α, with four longer four- to five-turn α-helices (α1-α4 in Fig. 2a and Supplementary Fig. S1a) followed by a shorter two-turn α-helix (α5). A DALI search with residues Lys4-Met82 of the YdaT monomer picks up a whole series of DNA-binding proteins that all contain a helix-turn-helix structural motif. Next to the obvious YdaT from E. coli O6:H1 (PDB entry 3c4r; Z-score of 20.1 and 84% sequence identity), the best matches involve POU-domain transcription factors (Table 3). POU domains are conserved domains that are found in eukaryotes, with the acronym referring to three transcription factors (Pit-1, Oct1/2 and Unc-86) in which the domain was first described (Phillips & Luisi, 2000). The first bacterial match is the transcription regulator ClgR from Corynebacterium glutamicum (Russo et al., 2009), while the first toxin-antitoxin-related match is HipB from Shewanella oneidensis (Wen et al., 2014).
A long (14-residue) loop between helices 2 and 3 making up the HTH motif is unique to YdaT and is absent from all other POU-domain structures. In addition, the central α-helix 3 (Ala50-Asp63, which is the recognition helix in the HTH motif) of YdaT is longer than the corresponding helix in POU-domain structures such as Pit-1 or Oct-4 (PDB entries 1au7 and 6ht5), and α-helix 1 (Glu6-His17) is in a different relative orientation (Figs. 2a and 2b). A BLAST search (Altschul et al., 1997) … hits with an E-value smaller than 0.1. In these sequences the 2-3 loop varies in length between nine and 23 residues and does not contain any obviously conserved residues (Supplementary Fig. S1b).
Figure 2. … (Phillips & Luisi, 2000). The long loop between helices 2 and 3 is absent from Oct-4 and the recognition helix 3 is significantly longer in YdaT compared with Oct-4. (d) Tetramer formation through the creation of a four-helix bundle. In two subunits helix 6 is shown as a cartoon, while in the other two subunits this helix is shown as a grey Cα trace. Side chains that make up the hydrophobic core are shown as sticks. (e) Superposition of the four subunits in the crystal structure of the YdaT tetramer. The tetramerization helices 6 are superimposed, showing the variability in orientation of the corresponding POU domains. The POU domains of chains A and C are coloured green and oriented differently from the POU domains in chains B and D, which are coloured blue.
The POU domain of YdaT is highly rigid, with r.m.s.d. values between 0.4 and 0.5 Å for all backbone (N, Cα, C and O) atoms of residues 3-90. Only Asp44-Glu49, which corresponds to the C-terminal part of the long YdaT-specific loop between helices 2 and 3, shows some small variation.

YdaT is a tetramer in solution
YdaT forms a tetramer in the crystal (Fig. 2c). The long C-terminal α-helix 6 serves as an oligomerization motif, with four such helices assembling into an antiparallel bundle, burying a total surface of around 2600 Å2. A hydrophobic core is formed via the side chains of Ile103, Leu110, Val111, Val114, Phe117, Val118 and Ala121 (Fig. 2d). This C-terminal α-helix 6 can bend slightly, changing the direction of its N-terminus. Chains A and C adopt very similar conformations, as do chains B and D. When both pairs are compared and superimposed using the C-terminal helix 6 (residues Tyr100-Ser120), the orientation of the N-terminal domain rotates by almost 30° (Fig. 2e). The loop Ser94-Tyr99 here serves as a hinge region. The POU-like DNA-binding domains themselves are highly similar, with only some small variations in backbone structure in the loop between helices 2 and 3.
To determine the oligomeric state of YdaT in solution, we performed SEC-MALS. The elution profile (injected monomer concentrations of 54.1 and 541.4 µM, corresponding to 1 and 10 mg ml−1, respectively) shows a single symmetrical peak corresponding to molecular weights of 69.4 or 72.9 kDa, respectively, in close agreement with the theoretical mass of 73.9 kDa for a tetramer (Figs. 3a and 3b). Native mass-spectrometry measurements confirm this result. At a concentration of 1.0 µM YdaT (monomer equivalents) is primarily a tetramer, but some monomer is also present (Fig. 4a).
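The relation between the injected mass concentrations and the monomer molarities quoted above can be checked with a short calculation. The Python sketch below takes the theoretical tetramer mass of 73.9 kDa from the text and assumes that the monomer mass is one quarter of it; the function name is illustrative.

```python
TETRAMER_MASS_DA = 73900.0               # theoretical tetramer mass (from the text)
MONOMER_MASS_DA = TETRAMER_MASS_DA / 4   # assumed: four identical chains


def mg_per_ml_to_micromolar(mg_per_ml, mass_da):
    """Convert mg/ml (equal to g/l) into micromolar for a species of given mass."""
    return mg_per_ml / mass_da * 1e6


for conc in (1.0, 10.0):
    monomer_um = mg_per_ml_to_micromolar(conc, MONOMER_MASS_DA)
    print(f"{conc:4.1f} mg/ml -> {monomer_um:6.1f} uM monomer, "
          f"{monomer_um / 4:6.1f} uM tetramer")
```

The small difference from the quoted 541.4 µM at 10 mg ml−1 arises from rounding of the tetramer mass.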
The tetramer observed in our crystal structure does not perfectly match 222 symmetry. Because of differences in bending and hinge conformation, the POU domains in chains A and C are significantly further from each other than those in chains B and D (10.9 versus 5.1 Å as their closest distance). Similar relative movements of the POU domains are also observed in the structure of the closely related uncharacterized protein in PDB entry 3c4r. In order to further characterize these inter-domain dynamics in solution, we performed SEC-SAXS (Figs. 5a and 5b). A good fit of the data (χ2 = 1.68) is obtained with an ensemble of ten structures where the N-terminal domains are allowed to move relative to the C-terminal four-helix bundle, confirming the structural variability observed in the crystal.

YdaT recognizes an inverted repeat located between the ydaS and ydaT genes
The structural organization of the prophage CP-933P suggests YdaS and YdaT as the equivalents of the Cro and CII repressors in λ, a hypothesis that was recently substantiated by … segment covering the end of the ydaS gene and the start of the ydaT gene. EMSA experiments show concentration-dependent binding to this fragment (Fig. 6a). Several bands can be observed with increasing YdaT concentration, indicating multiple binding events. In contrast, binding to the unrelated promotor/operon region of prsQ2 from Cupriavidus metallidurans NA4 requires roughly fourfold higher YdaT concentrations and does not lead to the observation of multiple species.
In order to pinpoint the exact binding site of YdaT, we turned to DNase I footprinting using a slightly longer 236 bp fragment (Fig. 7). At the lowest protein concentration used (0.5 µM), protection was already observed on both strands for a 25 bp region containing a 5′-TTGATTN6AATCAA-3′ inverted repeat located at the end of the coding region of the ydaS gene and downstream from the P RE933P promotor that was proposed as an alternative transcription start for the paaR2-paaA2-parE2 operon and which is controlled via YdaT (Jurėnas et al., 2021). This highly protected inverted repeat is referred to as O M (Figs. 1 and 7). At higher protein concentrations, starting from 1.0 µM, this DNase I-protected area extends further on each side, resulting in a total region of protection of approximately 95 nt on both strands that is composed of zones of weaker and stronger protection. The latter contain sequences that show sequence similarity to part of the high-affinity binding site and may represent additional lower affinity binding sites, thus explaining the multiple bands observed in the EMSA experiments. These two incomplete inverted repeats are referred to as O L and O R (Figs. 1 and 7).
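That the two half-sites of the TTGATTN6AATCAA motif form an inverted repeat can be verified by taking the reverse complement of one half-site, and the motif can be located in a sequence with a simple pattern search. The Python sketch below illustrates both; the test sequence is invented for the example and is not the CP-933P operator sequence, and the helper names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Two 6 bp half-sites separated by any six bases (N6).
MOTIF = re.compile(r"TTGATT[ACGT]{6}AATCAA")


def find_sites(seq):
    """0-based start positions of non-overlapping motif matches on the given strand."""
    return [m.start() for m in MOTIF.finditer(seq.upper())]


# The right half-site is the reverse complement of the left half-site.
print(reverse_complement("TTGATT"))

# Invented example sequence containing one copy of the motif.
example = "GGG" + "TTGATT" + "ACGTAC" + "AATCAA" + "CC"
print(find_sites(example))
```

Because each half-site is the reverse complement of the other, scanning one strand is sufficient to detect the perfect repeat; incomplete repeats such as O L and O R would require a more tolerant search.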

Thermodynamics of operator binding
Next, we turned to isothermal titration calorimetry (ITC) to understand how YdaT binds to its operator region. We therefore selected a set of fragments containing different potential binding sites (Table 4, Supplementary Fig. S5). Fragment O M contains the central inverted repeat that was identified as the main binding site for YdaT in the DNase I protection experiment. This fragment binds to YdaT with an affinity in the submicromolar range and two binding events can be discerned that differ in affinity by approximately a factor of four (Table 4, Fig. 8, Supplementary Fig. S5). This indicates that the YdaT tetramer binds two duplexes of O M with apparent negative cooperativity of enthalpic origin.
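Dissociation constants and enthalpies obtained from ITC are related to the free energy and entropy of binding through the standard relations ΔG = RT ln K d and TΔS = ΔH − ΔG. The Python sketch below applies these relations at 298 K; the K d and ΔH values used in the example are hypothetical and are not taken from Table 4.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def delta_g(kd_molar, temp_k=298.0):
    """Free energy of association (kcal/mol) from a dissociation constant in M."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


def t_delta_s(delta_h, delta_g_value):
    """Entropic contribution T*dS (kcal/mol) from dH and dG."""
    return delta_h - delta_g_value


# Hypothetical submicromolar binding event.
kd = 0.5e-6   # M
dh = -30.0    # kcal/mol
dg = delta_g(kd)
print(f"dG = {dg:.2f} kcal/mol, TdS = {t_delta_s(dh, dg):.2f} kcal/mol")
```

A fourfold difference in K d, as seen between the two binding events, corresponds to a ΔG difference of RT ln 4, which is less than 1 kcal mol−1 at this temperature.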
To further confirm this proposed binding model to O M, we turned to native mass spectrometry (MS). With an excess of O M, all YdaT tetramers are saturated with two such DNA fragments, leading to a species of 110.7 kDa. When YdaT is in excess, on the other hand, YdaT tetramers with both one and
SAXS solution structure of YdaT. (a) Experimental SAXS data of YdaT (black) and the calculated curve (red) obtained from the best-fitting ensemble of ten conformers (χ² = 1.684). (b) Superimposition of the ten conformers of the YdaT SAXS ensemble (black Cα traces) onto the YdaT crystal structure (cartoon representation coloured as in Fig. 2b). In the SAXS ensemble, the POU domains adopt different orientations relative to the C-terminal four-helix bundle due to the flexibility of the loop between the C- and N-terminal domains (residues 96-99).
Table 4 Thermodynamics of operator binding obtained from ITC.
The K d and ΔH parameters were obtained from fitting the model equations to the ITC data and are reported at T = 298 K. In all cases index 1 refers to the binding of the central inverted repeat to the first YdaT binding site and index 2 refers to the binding of the same repeat from another DNA molecule to the second YdaT binding site. Index 3 refers to the binding of YdaT to the distant incomplete repeat (R or L), but it is not clear whether one or two YdaT binding sites are engaged in binding. The free energy of association ΔG and the entropic contribution TΔS were calculated using standard equations. Standard mean errors are obtained from the fitting procedure or are calculated through error propagation in the case of TΔS and ΔG. K d values are given in µM, while ΔG, ΔH and TΔS values are in kcal mol−1.
† It is not possible to reliably determine ΔH 1 and ΔH 2 separately as they are strongly correlated. The same holds for the respective entropic contributions. However, the total sum (ΔH 1 + ΔH 2) can reliably be obtained from fitting. Note that the obtained sums are comparable to that for the titrations with O M (ΔH 1 + ΔH 2 = −70.3 kcal mol−1), which correspond to the same binding events to the central inverted repeat via the first and second YdaT.
‡ Refers to the sum TΔS 1 + TΔS 2.
§ The affinity for the second operator binding site is too weak (K d3 = 53 µM) to reliably determine ΔH 3.
In contrast, fragments containing the possible alternative incomplete inverted-repeat sequences O L or O R did not allow high-quality ITC data to be measured. Both fragments bind only weakly and no reliable thermodynamic parameters could be obtained (Supplementary Fig. S2) Fig. 8). Binding to these low-affinity sites increases the affinity for the main site by a factor of 2-4, indicating an interaction between the different sites. Given the distance between these sites, especially in the case of O LM , this communication is likely to propagate through the DNA rather than through direct contacts between two bound YdaT tetramers.
DNase I footprinting. DNase I footprint on a 236 bp operator fragment (bottom strand shown) using increasing concentrations of YdaT (in tetramer equivalents). The nucleotide sequence of the region that becomes protected is shown at the side with an indication of the regions of protection. The sequence of the 95 nt protected region is indicated on the right. At 0.5 µM YdaT, a 25 bp zone of protection (dark grey background) containing the inverted repeat 5′-TTGATTN6AATCAA-3′ (bold) can already be observed. At higher protein concentrations this zone is extended further on both sides of the primary binding site, resulting in a total footprint of 94-97 nt on both strands that consists of zones of stronger (light grey background) and weaker (white background) protection. The former overlap with two imperfect inverted repeats (bold). These inverted repeats are labelled O L , O M and O R as described in the text.

SAXS model of the YdaT-operator complex
The four DNA-binding domains of YdaT are oriented such that two surfaces are generated that face away from each other, and in principle each can accommodate an ~30 bp DNA duplex. This is in agreement with the binding stoichiometry that is obtained from ITC and native MS. Using the structure of mouse octamer-binding protein 4 (Oct-4) in complex with a 21 bp DNA duplex (PDB entry 6ht5) as a guide, we built a model of YdaT bound to a 30 bp B-DNA duplex containing the O M sequence identical to that used for ITC (O M in Supplementary Table S4). Next, this model was validated using SAXS (Fig. 9 and Supplementary Table S4). The molecular weight determined from the SAXS data is about 102 kDa, which is close to the theoretical molecular weight of 110.7 kDa for a complex consisting of one YdaT tetramer and two O M molecules. The central four-helix bundle was fixed, while the POU domains were allowed to reorient while remaining docked onto the DNA through variation of the hinge loop Ser96-Tyr99. The N-terminal His tag and the C-terminal 20 amino acids remain highly flexible. As a consequence, pairs of two POU domains (chains A and C or chains B and D) remain locked together via the bridging DNA molecule and their movements relative to the C-terminal helix α6 become highly restricted compared with those observed in the free structure (Figs. 9a and 9b). The best fit to the experimental data (χ² = 1.418) was obtained with ten ensembles each containing two conformers. More conformers did not improve the fit.
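The χ² values quoted for the SAXS fits measure how well a calculated scattering curve reproduces the experimental one. The sketch below uses one common definition, a reduced χ² with a least-squares scale factor and an equally weighted average over conformer curves. This is an assumption for illustration: the exact definition and weighting used by the fitting software behind the reported values may differ, and all function names are hypothetical.

```python
def ensemble_curve(curves):
    """Average the calculated intensities of several conformers (equal weights assumed)."""
    n_conf = len(curves)
    return [sum(values) / n_conf for values in zip(*curves)]


def scale_factor(i_exp, i_calc, sigma):
    """Least-squares scale that brings i_calc onto i_exp, weighted by 1/sigma^2."""
    num = sum(e * c / s**2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s**2 for c, s in zip(i_calc, sigma))
    return num / den


def reduced_chi2(i_exp, i_calc, sigma):
    """Reduced chi-square between experimental and scaled calculated intensities."""
    k = scale_factor(i_exp, i_calc, sigma)
    n = len(i_exp)
    total = sum(((e - k * c) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    return total / (n - 1)


# Toy data (hypothetical, not measured): two conformer curves on three q points.
conformers = [[4.0, 3.0, 2.0], [6.0, 5.0, 4.0]]
i_calc = ensemble_curve(conformers)   # [5.0, 4.0, 3.0]
i_exp = [10.0, 8.0, 6.0]              # exactly twice the averaged curve
sigma = [1.0, 1.0, 1.0]
print(reduced_chi2(i_exp, i_calc, sigma))
```

Under this definition, a value close to 1 indicates that the residuals are, on average, comparable to the experimental errors.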
Compared with the crystal structure of the free form of YdaT, no conformational changes are required in YdaT except for some rigid-body movement of the different POU domains relative to each other and the reorientation of a few side chains. In particular, the A-C pair of POU domains is oriented in our crystal structure of the free state such that the two domains cannot correctly bind together to the same DNA duplex.
The residues of the Leu35-Glu49 loop fold over the ribose-phosphate backbone of one DNA strand, while the recognition helix α3 (Ala50-Asp63) of the HTH motif docks into the major groove of the DNA (Fig. 9c). Arg60 is well positioned to make base-specific hydrogen bonds to Gua21. This arginine is highly conserved in all available POU-domain structures, where it makes similar contacts. It is also conserved in all sequences of YdaT homologs picked up by our BLAST search. Interestingly, in some of these sequences an insertion of a single amino acid between Phe59 and Arg60 is observed that is predicted by AlphaFold2 to result in a single turn of α-helix to allow the side chain of Arg60 to remain in the correct orientation and position. The side chain of Arg53 is likely to form a hydrogen bond to the neighbouring backbone phosphate group of Thy10. Residues of the His17-Gly20 loop together with the N-terminus of helix α2 (Glu21-Glu34) are in contact with the other DNA strand and a hydrogen bond between the backbone amide of Glu21 and an O atom from the ribose backbone is likely. This residue is also not conserved in the eukaryotic POU domains or in the bacterial YdaT homologs.

Oligomerization of YdaT is required for DNA binding
In order to understand the role of oligomerization in DNA binding, we created a mutant in which YdaT is truncated after His96 (YdaT 1-96 ). This corresponds to a protein consisting of the N-terminal domain but lacking the C-terminal oligomerization helix α6. YdaT 1-96 appears to be a well folded species in solution with a predominantly α-helical structure (Supplementary Fig. S4a), in agreement with the conformation of this domain in the crystal structure of the full-length protein. SEC-MALS and SAXS further support this conclusion (Supplementary Table S4, Fig. 3c, Supplementary Figs. S5a and S5b). No measurable interaction between YdaT 1-96 and the YdaT operator DNA was detected in EMSA experiments (Fig. 6b), indicating that a monomeric POU domain is insufficient for effective DNA binding. The molecular weights determined from SEC-MALS and SAXS analysis are 11.1 and ~12 kDa, respectively, and are in close agreement with the theoretical molecular weight of 13.3 kDa for a monomeric species. The theoretical scattering curve calculated for the ensemble of ten conformers fits the experimental curve well (χ² = 0.983).

Discussion
YdaT was originally identified as one of two potential transcripts encoded by the ydaST operon in E. coli O157:H7. This operon was originally suspected to encode a toxin-antitoxin pair (Sevin & Barloy-Hubler, 2007), but was recently shown to be part of the cryptic prophage CP-933P, where the cognate proteins function as equivalents of Cro and CII (Jurėnas et al., 2021). The cryptic prophage CP-933P has lost its ability to enter a lytic cycle. Its immunity region contains the paaA2-parE2 toxin-antitoxin gene pair that replaces the rexA and rexB genes and is preceded by the gene for the PaaR2 regulator that replaces CI.
CP-933P YdaT is a representative of a family of transcription regulators found in lambdoid phages and is functionally but not structurally related to CII. The DNA-binding domain of YdaT is a POU domain, with an unusually long loop between the two helices of the HTH motif (α2 and α3) as a defining structural feature. A sequence alignment of YdaT homologs shows that this long loop varies in length and sequence within the YdaT family (Supplementary Fig. S1b) and does not contain obvious conserved residues, despite being likely involved in operator binding. Equally, the recognition helix α3 contains only a single residue that is fully conserved in the YdaT family, Arg60, which according to our model is likely to be essential for operator binding. The evolutionary pressure that drives this variability even though the protein is still functional as a repressor in a cryptic phage such as CP-933P (Jurėnas et al., 2021) is unclear. It is possible that it relates to a requirement to stably maintain cryptic prophages or segments thereof in the genome, similar to toxin-antitoxin pairs of the same family that are only found together on the same chromosome if they do not interact (Goeders & Van Melderen, 2014 and references therein).
YdaT of the cryptic prophage CP-933P is functional as a DNA-binding protein, as illustrated by our EMSA, ITC and DNase I protection experiments as well as by previously reported in vivo data (Jurėnas et al., 2021). This functionality requires oligomerization. YdaT is a symmetric tetramer with two oppositely positioned sets of DNA-binding sites, meaning that it can recognize two 30 bp operator sequences simultaneously. Yet, the CP-933P prophage only contains a single strong binding site, possibly leaving one pair of POU domains unbound. Alternatively, YdaT has the potential to stabilize a DNA loop. However, the distance between the strong main binding site and the two flanking potential secondary sites is too short to allow the formation of a loop. Indeed, weak binding to these sites seems to occur independent of binding to the main site.
CII, the equivalent of YdaT in λ, is also a tetramer, but rather than forming a closed point group has an 'unusual dimer-of-dimers' architecture in which two of the subunits (each from a different dimer) are correctly positioned relative to each other to recognize a direct repeat on the operator (Datta et al., 2005; Jain et al., 2005). The other two monomers form a bridge between the 'active' subunits but are not themselves involved in DNA binding. In this architecture the two DNA-binding subunits are oriented in the same direction, as would be required for recognition of a direct repeat. YdaT, on the other hand, forms a more classic type of tetramer with internal 222 point-group symmetry and is therefore suited to recognize an inverted repeat.